Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Enable UMAP to build knn graph using NN Descent #5910

Merged
merged 55 commits into from
Jul 31, 2024

Conversation

jinsolp
Copy link
Contributor

@jinsolp jinsolp commented Jun 1, 2024

Description

  1. Adds build_algo=nn_descent option to UMAP.

Now user can choose the knn graph build algorithm between "brute_force_knn" and "nn_descent"
Defaults to "auto", in which case decides to run with brute force knn or nn descent depending on the given dataset size.

"auto" decides to run with brute_force_knn if either 1) data has <= 50K rows OR 2) data is sparse. Otherwise decides to run with nn_descent.

50K rows roughly chosen based on grid search below. (runtime in ms) - Discussed with Corey
Screenshot 2024-07-23 at 5 36 34 PM

X_embedded_nnd = cuUMAP(n_neighbors=16, build_algo="nn_descent").fit_transform(data)
score_nnd = cuml.metrics.trustworthiness(data, X_embedded_nnd)
  1. Adds data_on_host option (defaults to False) when calling fit() or fit_transform()

Note that brute force knn cannot be used with data on host

Running Benchmarks

Screenshot 2024-07-23 at 5 41 19 PM

@jinsolp jinsolp requested review from a team as code owners June 1, 2024 00:34
@github-actions github-actions bot added Cython / Python Cython or Python issue CUDA/C++ labels Jun 1, 2024
@jinsolp
Copy link
Contributor Author

jinsolp commented Jun 1, 2024

Oh the check is failing because we need the updated implementation of RAFT NN Descent (linked above)...!

Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just have a few suggestions. Also, it could be great to add this feature to the simplicial set functions. But, we could do that in another PR too.

cpp/src/umap/knn_graph/algo.cuh Outdated Show resolved Hide resolved
cpp/src/umap/knn_graph/algo.cuh Outdated Show resolved Hide resolved
python/cuml/manifold/umap.pyx Outdated Show resolved Hide resolved
python/cuml/cuml/manifold/umap.pyx Outdated Show resolved Hide resolved
cpp/src/umap/knn_graph/algo.cuh Outdated Show resolved Hide resolved
@jinsolp
Copy link
Contributor Author

jinsolp commented Jul 26, 2024

Added warning and linked issue for consistency with previous version

@jinsolp jinsolp requested a review from divyegala July 29, 2024 19:26
@dantegd dantegd added feature request New feature or request non-breaking Non-breaking change labels Jul 30, 2024
@dantegd
Copy link
Member

dantegd commented Jul 30, 2024

/ok to test

@dantegd
Copy link
Member

dantegd commented Jul 30, 2024

/ok to test

@dantegd
Copy link
Member

dantegd commented Jul 30, 2024

/ok to test

@divyegala
Copy link
Member

/ok to test

@dantegd
Copy link
Member

dantegd commented Jul 30, 2024

/merge

@divyegala
Copy link
Member

/ok to test

@divyegala
Copy link
Member

/ok to test

@divyegala
Copy link
Member

/ok to test

@dantegd
Copy link
Member

dantegd commented Jul 31, 2024

/merge

@rapids-bot rapids-bot bot merged commit 50d1a74 into rapidsai:branch-24.08 Jul 31, 2024
61 of 62 checks passed
@jinsolp jinsolp deleted the add-umap-nndescent branch July 31, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CUDA/C++ Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants